Coastal lagoons, both hypersaline and freshwater, are common, but still understudied ecosystems. We 
describe, for the first time, using high throughput sequencing, the extant microbiota of two large and 
representative Mediterranean coastal lagoons, the hypersaline Mar Menor, and the freshwater Albufera de 
Valencia, both located on the south eastern coast of Spain. We show there are considerable differences in the 
microbiota of both lagoons, in comparison to other marine and freshwater habitats. Importantly, a novel 
uncultured sulfur oxidizing Alphaproteobacteria was found to dominate bacterioplankton in the 
hypersaline Mar Menor. Also, in the latter, prokaryotic cyanobacteria were almost exclusively composed of 
Synechococcus, and no Prochlorococcus was found. Remarkably, the microbial community in the 
freshwaters of the hypertrophic Albufera was completely in contrast to known freshwater systems, in that 
there was a near absence of well known and cosmopolitan groups of ultramicrobacteria namely Low GC 
Actinobacteria and the LD12 lineage of Alphaproteobacteria. 

Coastal lagoons are shallow water bodies separated from the sea by a barrier, connected at least intermit- 
tently to the sea by one or more restricted inlets, and usually oriented parallel to the shore. The formation 
of the barrier is crucial, as it allows lagoon waters to acquire significantly different characteristics com- 
pared to the nearby seawater. Mediterranean coastal lagoons commonly are not affected by significant tidal 
influences as tides in the Mediterranean Sea are very low. This avoids the diel inputs of seawater that are common 
in oceanic salt marshes. Because of the relatively increased isolation from the sea and their location within a 
hydrological catchment, these lagoons also become more susceptible to changes in salinity, dissolved oxygen 
and nutrient content, largely owing to the increased effect of evaporation in a restricted area, which leads to increased 
salinity and the deposition of various salts (e.g. calcium carbonate), as well as to the stronger influence of the 
surrounding land. Because of the high population density in coastal Mediterranean areas, these lagoons are 
usually impacted by agricultural, mining, tourism and general developmental activities leading to the lagoon 
becoming a common sink for a wide variety of waste material 1 . These differences are also reflected in the 
organisms that inhabit these lagoons, which may largely contrast with those of the nearby marine environment. 
Such lagoons are very common environments in flat areas along the Mediterranean coasts, and may range from 
small to very large size. 

Albufera de Valencia (from its Arabic name البحيرة, al-buhayra, "the little sea"), located a few km south of 
the city of Valencia, Spain (39°19'54"N, 0°21'8"W), is a shallow (1 m on average) coastal lagoon which nowadays 
holds freshwater. It was originally a marine harbour that became progressively separated from the sea by a sand strip 
growing from north to south, due to the dominant marine currents and the deposition of river sediments, later 
becoming a large brackish coastal lagoon nearly 300 km² in size from Roman times until the 18th century 1 . 
However, in the second half of the 19th century, nearly 60% of the lake was filled in to reclaim land for the 
cultivation of rice. This decline continued, and today the lake has shrunk to ~23 km², nearly 15 times smaller 
than its original size 2 . During the course of this regression, freshwater inflow to the lagoon increased, owing to 
increased rice cultivation and the development of irrigation within the catchment (917 km²) croplands, and today 
it is surrounded by ~223 km² of rice fields that largely determine its hydrological functioning. Moreover, the 
outgoing connection to the sea became controlled by hydraulic gates and 
inflow of sea water was totally stopped, leading to the complete conver- 
sion of a once brackish ecosystem into a freshwater one. Because of 
the increased human activities in its densely populated surroundings, 
Albufera collapsed as a macrophyte-dominated lagoon and turned into a 
highly hypertrophic ecosystem with very dense phytoplankton popula- 
tions primarily dominated by cyanobacteria 3 . 

The Mar Menor (also meaning "Little Sea" in Spanish), another 
huge coastal lagoon of nearly 135 km² in surface area and somewhat 
triangular in shape, is among the largest lagoons in the Mediterranean. 
Located in the region of Murcia (Spain, 37°43'08"N, 
00°47'14"W), it is separated from the sea by an extremely thin and 
nearly 20 km long strip of land (called La Manga), and is hypersaline 
(~5%). Mar Menor receives water from a number of sources, mainly 
small streams flowing into the lagoon (usually seasonal), run-off 
from mining activities, wastewater treatment plants overflows, agri- 
cultural land runoff and from urban development and tourist activ- 
ities 4 . This area of the Spanish coast is also among the most 
threatened by the rise in global sea levels 5 . Both Albufera and Mar 
Menor have received, and still partly receive, substantial amounts of 
wastewater that have led to a high trophic status in both water bodies. 
The strongest inputs were historically received by Albufera; they led 
this lagoon to an extreme hypertrophic status and maintain a very high 
internal load of nutrients in its sediments which, in addition to 
external inputs 1 , supports extensive algal growth. By 
studying these two systems, we cover the most representative eco- 
logical types of Mediterranean coastal lagoons, namely freshwater 
lakes dominated by continental hydrological processes like Albufera, 
and hypersaline lagoons maintaining wider connections with the sea 
but having higher salinity because of evaporative processes, like Mar 
Menor. Good examples of saline coastal lagoons are those located in 
the Languedoc-Roussillon region of France, where a series of coastal 
lagoons, mostly connected with the sea, are spread along the 
Mediterranean coast. Contrastingly, Albufera de Valencia probably 
represents the best example of a freshwater coastal lagoon in the 
Mediterranean. 

The marine environment has now been the focus of a large num- 
ber of metagenomic studies and they have already succeeded in 
providing us with a partial view of the organisms in this habitat 6-11 . 
The contribution of marine picoplanktonic cyanobacteria to global 
oxygen levels 12 , the superabundance of Candidatus Pelagibacter 
species 13 , vertical patterns in microbial diversity 10 and several other 
findings have served to illustrate the diversity of the marine 
microbial world. Moreover, salinity has been shown to be an extremely 
important factor in the distribution of microbes 14 , much more so than 
temperature and pH. Saline environments have been well studied 
using several different methodologies, including 16S rRNA genes, cultures and 
metagenomic approaches 15-22 . However, most saline environments 
examined so far have been primarily solar salterns, which are usually 
controlled environments, shielded from extraneous inputs (e.g. 
urban waste). Coastal lagoons, on the other hand, are more suscept- 
ible to vagaries of natural and human origin. Thus, in spite of the 
widespread nature and importance of hypersaline coastal lagoons, we 
know very little about the microbial composition of these ecosys- 
tems. The same also occurs for freshwater coastal lagoons, whose 
microbiota, compared to that of the open sea, has been scarcely 
studied, even less using modern metagenomic approaches. A lack 
of information is evident for the composition of the microbiota of 
Mediterranean coastal lagoons, but new generation sequencing 
methodology promises to offer a view of the microbial diversity 
and to unmask the environmental characteristics that would act as 
selective factors in determining the microbiota of the two main types 
of such ecosystems. This will also allow us to compare our metage- 
nomic description of the microbial component of the plankton, with 
a series of still relatively scarce metacommunity data from other 
aquatic ecosystems, both saline and freshwater. 



On the other hand, freshwater ecosystems also have their own 
characteristic abundant microbes, analogous to the ubiquitous marine 
SAR11 lineage in the oceans, i.e. Low GC Actinobacteria 23-25 , 
Betaproteobacteria (e.g. Polynucleobacter 26,29 ) and the LD12 clade of 
Alphaproteobacteria 30,31 (related to marine SAR11). These are usually 
threatened ecosystems, facing ever increasing pressure due to con- 
tinued human activities. Indeed, freshwater ecosystems (rivers and 
lakes together) comprise only 0.266% of all freshwater on earth 32 . In 
this small percentage of available freshwater, several factors influence 
microbial diversity in lakes e.g. trophic status 33,34 , pH 25-35 , landscape 36 
and water retention time 35 . While lagoon salinity is indeed a major 
driver of microbial diversity 37 , another important physical characteristic 
is the depth of the lagoons, as shallow water 
bodies have better light penetration, faster nutrient recycling and 
higher primary productivity. The average depth of Albufera is quite 
shallow (~1 m), while Mar Menor is comparatively deeper (~5 m). 
Though there has been much work on lakes using 16S rRNA and 
culture-dependent approaches, recently, culture-independent 
approaches have also begun to shed light on these systems 8,24,38,39 . 
However, eutrophic freshwater lagoons have not yet been studied 
using high-throughput metagenomic sequencing (indeed, there are 
no metagenomic datasets from a eutrophic freshwater system yet), 
and given the biases in culture-dependent and culture-independent 
approaches 7,40 , it is important to study these habitats using less biased 
methods. In addition, both these lagoons, Albufera and Mar Menor, 
are eutrophic systems 1,4 , especially the former, and both kinds, 
hypersaline and freshwater coastal lagoons, are particularly widespread 
along the Mediterranean coast and indeed all over the world. 
We have very little information on the microbes of these habitats, 
and very few studies have been done with Mediterranean coastal 
lagoons in particular. Some studies have focused on cell counts e.g. 
for picocyanobacteria, in several coastal lagoons of the Adriatic 41 , 
and others have all been 16S rRNA surveys, e.g. for lagoons 
on the French Atlantic 42 and Mediterranean coasts 43 , and also for the 
large lagoon of Venice 44,45 , which is a particularly productive envir- 
onment, connected to the Adriatic Sea. 

As part of the Global Ocean Sampling expedition, we have used 
high-throughput metagenomic sequencing to investigate the micro- 
bial diversity of the Albufera and Mar Menor lagoons and compare 
them with other closely related aquatic environments. To compare 
with the hypersaline Mar Menor metagenome, we have chosen 
three metagenomic datasets from saline environments i.e. the 
Mediterranean Deep Chlorophyll Maximum (referred to as DCM), 
a marine metagenome 7 , of relevance for this case because the 
Mediterranean Sea is the primary water input into Mar Menor, 
and a metagenome from a very shallow hypersaline lagoon (salinity 
6%, depth 0.3 m), called Punta Cormoran (referred to as PC6) in 
Galapagos Islands 8 , as it is the only other hypersaline lagoon dataset 
available, and also because it is a relatively pristine environment. In 
addition, a 19% salinity dataset from a solar saltern (referred to as 
SS19) is also included as an extremely hypersaline environment 17 . 
For comparison with the freshwater Albufera dataset, we have 
included three metagenomes, two from lentic habitats and one from a 
lotic habitat. The lentic datasets include Lake Lanier 46 (Georgia, US; 
it forms the primary drinking water supply for the Atlanta metropolitan 
area) and Lake Gatun 8 (located in the middle of the Panama 
Canal). The lotic metagenomic dataset is from the pristine upper 
water column of the Amazon River 24 . We compare the Mar Menor 
and Albufera metagenomes to these metagenomes in order to unmask 
ecological factors that could be related to the microbial composition 
of their respective communities. 

Results 

Physico-chemical characteristics. The locations where the samples 
were taken are shown in Supplementary Figure S1. Some physico-chemical 
and biological properties of the samples are described in 
Supplementary Table S1. Salinity, as represented by electrical 
conductivity, is much higher in Mar Menor than in the 
Mediterranean Sea. Albufera waters, however, appeared as highly 
mineralized freshwaters, showing a certain influence of the sea, 
with values of 2.8 mS cm⁻¹, as compared to freshwaters from the 
area, which commonly show conductivities of around 1 mS cm⁻¹. 
Even though Albufera is well separated from the sea, open inlets 
controlled by hydraulic gates sometimes allow some connection, 
and aquifers providing water to the lagoon are slightly influenced 
by marine waters. This demonstrates that we have chosen the two 
extremes of the main environmental condition determining the 
ecology of coastal lakes, that is, salinity, with both a highly saline 
and a freshwater lagoon. As is typical of well mineralized waters, both 
samples were mildly alkaline (pH was 8.4 for Mar Menor and 7.69 
for Albufera), but alkalinity (and bicarbonate concentrations) in 
Albufera was lower compared to surrounding freshwater systems. 
The lower alkalinity in Albufera is mainly due to the high rates of 
planktonic primary production of such a hypertrophic system, which 
uses large amounts of inorganic carbon, thus decreasing the 
alkaline reserve mainly formed by bicarbonate. Saline content of 
Albufera, though much lower than that of Mar Menor, is quite 
balanced in anions between bicarbonate, chloride and sulphate, 
whereas that of Mar Menor is much higher, and mostly due to 
chloride. These data indicate the differences in relative importance 
of continental and marine inputs in these two systems. 

Total nitrogen (TN) and total phosphorus (TP) concentrations, 
taken together with chlorophyll concentrations, better reflect the 
extent and effects of eutrophication on both lagoons, as they show 
the amount of nutrients that are incorporated into biomass, mainly 
phytoplankton, in the form of particulate nutrients. Both TN and TP 
were around two and a half times higher in Albufera than in Mar 
Menor, showing that, in addition to salinity, these two systems also 
maintain a large difference in another quite important envir- 
onmental feature, namely, the trophic status. 

Chlorophyll-a concentration reveals even higher differences than 
TN and TP, as the chlorophyll levels of Mar Menor (3.94 µg/l) are 
actually very similar to those of the DCM of the Mediterranean 7 (3.4 
µg/l), while Albufera displayed levels corresponding to extremely 
hypertrophic conditions (271.31 µg/l). Following OECD criteria 47 , 
all these values categorized Albufera as a hypertrophic system, 
whereas those of Mar Menor correspond to a mesotrophic system 
but with a strong trend towards eutrophication as indicated by con- 
centrations of soluble nutrients. Remarkably, when considering both 
nitrogen and phosphorus, these nutrients are mostly included within 
the particulate fraction in the Albufera, with comparatively low 
amounts in the soluble forms of phosphorus (soluble reactive phos- 
phorus, mainly orthophosphate) and, even lower, of nitrogen 
(ammonia), compared to overall amounts that are mainly owed to 
the biomass of phytoplankton, as shown by Chl-a concentrations. 
Because of its long residence time during most of the year, Albufera 
acts as a bioreactor that converts most of the incoming nutrients into 
phytoplankton biomass, most of which is later retained in the sediments 
and represents a strong internal load that further supports 
hypertrophic conditions. Moreover, most of this phytoplankton biomass 
is composed of cyanobacteria, as shown by the dominance of 
taxa-specific carotenoids (Supplementary Table S1) from these 
phytoplanktonic organisms, such as zeaxanthin, as was further 
confirmed by microscopic and molecular analyses. Contrastingly, most 
nitrogen and phosphorus in Mar Menor was detected as soluble 
forms, with relatively low levels of phytoplankton biomass that are 
still comparable to productive areas of the sea, such as the DCM, but 
much poorer compared to Albufera. Ammonium is, in contrast to 
Albufera, the main form of nitrogen in the waters of Mar Menor. The 
very high planktonic biomass in Albufera quickly assimilates avail- 
able nutrients, especially those which are limiting, and ammonium is 
the preferred form of nitrogen to be assimilated by organisms as it 
has the same redox status as organic nitrogen. In Mar Menor, 
however, the high availability of soluble (biologically available) forms 
of nitrogen and phosphorus compared to the low chlorophyll levels 
indicates the occurrence of recent peaks of nutrient inputs into this 
lagoon, occurring briefly before the sampling, that have not yet had 
the time to be converted into biomass. Massive occasional nutrient 
inputs are a common feature of this lagoon and are associated with 
time-restricted discharges of wastewaters 48 or increased agricultural 
runoff linked to heavy rains. These inputs commonly cause algal 
blooms that are associated with such nutrient dynamics 4 . Recent 
modelling estimated that, accounting only for agricultural sources 
associated with irrigation procedures, more than 2000 tonnes of nitrogen 
and around 60 tonnes of phosphorus enter the Mar Menor per year, 
which, together with other sources such as urban wastewaters, 
explains the high levels of soluble nitrogen found in this 
lagoon. In addition to this modelling, previous empirical evidence 
of the high amounts of nutrients received by Mar Menor was given by 
Velasco et al 49 , who during a hydrological cycle measured nutrient 
inputs as high as 2010 tonnes of inorganic nitrogen and 178 tonnes of 
soluble reactive (biologically available) phosphorus in a year. Thus, 
our measurements of dissolved inorganic nitrogen, even if chloro- 
phyll concentrations are not so high, reveal a relatively high (meso- 
trophic to eutrophic) trophic status of Mar Menor compared to the 
coastal waters of the nearby Mediterranean Sea, though much lower 
than that of Albufera, where nutrients are likely quickly bioconverted 
into phytoplankton biomass. 
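
As an illustration of how the chlorophyll-a values reported above map onto trophic categories, the following minimal Python sketch applies a fixed-boundary classification. The boundary values (1, 2.5, 8 and 25 µg/l of chlorophyll-a) are an assumption taken from the commonly cited OECD fixed-boundary scheme and are not stated in this text, and the function name `trophic_class` is hypothetical. The OECD scheme is defined on annual means, whereas the values used here are single-sample measurements, so the result is only indicative.

```python
# Minimal sketch: classify trophic status from chlorophyll-a (µg/l).
# ASSUMPTION: upper boundaries follow the commonly cited OECD
# fixed-boundary scheme for mean chlorophyll-a; verify against the
# original OECD criteria before relying on them.
OECD_CHL_UPPER_BOUNDS = [
    (1.0, "ultra-oligotrophic"),
    (2.5, "oligotrophic"),
    (8.0, "mesotrophic"),
    (25.0, "eutrophic"),
]


def trophic_class(chl_a_ug_per_l: float) -> str:
    """Return the trophic category for a chlorophyll-a concentration."""
    if chl_a_ug_per_l < 0:
        raise ValueError("chlorophyll-a concentration cannot be negative")
    for upper, label in OECD_CHL_UPPER_BOUNDS:
        if chl_a_ug_per_l < upper:
            return label
    return "hypertrophic"  # anything at or above the last boundary


# Chlorophyll-a values quoted in the text (single samples, µg/l).
samples = {"Mar Menor": 3.94, "Mediterranean DCM": 3.4, "Albufera": 271.31}
classes = {name: trophic_class(value) for name, value in samples.items()}

if __name__ == "__main__":
    for name, label in classes.items():
        print(f"{name}: {label}")
```

Under these assumed boundaries, the chlorophyll-based category agrees with the description in the text (Mar Menor mesotrophic, Albufera hypertrophic); the soluble nutrient data discussed above are not considered by this sketch.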

Phytoplankton diversity and abundance. In contrast to the very 
different abundance of phytoplankton (quantified as Chl-a con- 
centration), both systems showed similar densities of heterotrophic 
bacterioplankton (in the range of 4-5 × 10⁶ cells per ml), higher than 
those commonly found in surface waters of the Mediterranean 
Sea 50-51 . However the abundance of phototrophic picoplankton, 
mainly unicellular Synechococcus-like cyanobacterial cells, was 
almost twenty times higher in La Albufera than in Mar Menor. 
These autotrophic picoplankton (APP) cells are similar to those of 
surface waters, phycocyanin-rich cells mostly lacking 
phycoerythrin 52 . However, although APP abundance is much 
higher in Albufera, they represented up to 9.4 % of phytoplankton 
biomass (biovolume) in Mar Menor. This contribution was 3.3 % in 
Albufera, where filamentous cyanobacteria, diatoms and chlo- 
rophytes accounted for most of the biomass (Supplementary 
Figure S2). The relatively high diversity of phytoplankton in 
Albufera (Figure 1, Supplementary Figure S3) revealed by our 
sampling is a relative novelty in this lake, arising in recent years 
in association with sewage diversion 53 , compared to previous decades, 
when filamentous cyanobacteria, like Planktothrix agardhii, 
Pseudanabaena galeata and Geitlerinema sp. widely dominated the 
community 54 . This relatively high diversity related to increased 
relevance of chlorophytes and diatoms compared to cyanobacteria 
is also shown by taxa-specific pigments. In addition to the high 
concentrations of the cyanobacterial-specific carotenoid zeaxanthin, 
high concentrations of the diatom-marker carotenoid fucoxanthin 
were also found (Supplementary Table S1). The high 
contribution of chlorophytes in terms of total phytoplankton 
biomass, mostly due to the presence of very big colonial species of 
Pediastrum (P. boryanum and P. duplex), which at the time of 
sampling accounted for 46.6% of total phytoplanktonic biovolume 
(Figure 1; Supplementary Figure S2) but only for 1.3 % of 
phytoplankton individuals, is likely the reason that chlorophyte-specific 
carotenoids are not so abundant. Sewage diversion, 
together with increased flushing during some periods associated with 
rice cultivation, sometimes promotes clear water phases in late 
winter and spring, as occurred in 2010, when sampling was 
performed and the most evident clear water phase of the last 
four decades was reported. Contrastingly, dinoflagellates 
dominated by far the phytoplankton in Mar Menor, both in terms of total 
phytoplankton biomass and number of cells (excluding APP for the 
latter count), with also relevant contributions of diatoms and 
unicellular picocyanobacteria (Figure 1; Supplementary Figure S2). 
These are also reflected in the abundance of the taxa-specific 
carotenoids (Supplementary Table S2), which, although at much 
lower concentrations than those of Albufera, also show the relative 
importance of the dominant phytoplankton groups. Neither Albufera 
nor Mar Menor holds planktonic anoxygenic phototrophic bacteria, as 
revealed by the absence of bacteriochlorophylls. 

Figure 1 | Pairs of microphotographs, DAPI stain (blue, up) and photosynthetic pigment autofluorescence (red, down) of samples from Albufera (A 
and B) and Mar Menor (C and D) showing different microorganisms. A) A colony of unicellular picocyanobacteria. B) Several filamentous cyanobacteria 
and coenobia of the chlorophytes Pediastrum sp. and Scenedesmus sp. C) Different morphologies of heterotrophic bacterioplankton (cells not showing red 
autofluorescence in lower pictures) and autotrophic picocyanobacteria (cells showing red autofluorescence in lower pictures). D) Heterotrophic 
bacterioplankton and autotrophic picocyanobacteria with a eukaryotic nanoflagellate. White bar corresponds to 10 µm in all pictures. 

GC Content. We obtained nearly equal amounts of sequence data 
from each of the three different filter sizes for each dataset (0.1, 0.8 and 
3.0 µm, see Supplementary Table S2). The sequence data from the 
three filters of Mar Menor show some differences in GC content 
(Supplementary Figure S4), with the 3 µm filter showing a high GC 
peak, likely because of the increased number of eukaryotic sequences 
captured in this filter. A comparison of the GC content of the two 
smaller filter sizes (0.1 and 0.8 µm) with other available marine and 
hypersaline metagenomes (DCM, PC6 and SS19) is shown in 
Supplementary Figure S4. The Mar Menor metagenome shows a 
single distinct peak at ~50%, similar to the marine metagenome in 
being unimodal, but of very different GC% and a broader GC range, 
and also distinct from the other hypersaline datasets which have clear 
bimodal GC distributions (both PC6 and SS19). All the hypersaline 
metagenomes do have at least a single peak at around 50% GC. The 
figure indicates that across a range of salinities (from 3.5% to 19%) a 
diverse range of GC content may be found. Moreover, the Mar Menor 
GC distribution appears to be quite different from that of PC6, although 
both habitats have nearly identical salinity (however, as no 
physico-chemical data other than salinity are available for the PC6 dataset, 
the factors relating to these differences cannot be 
adequately discussed). There does not appear to be an abundance 
of very high GC organisms (~70% GC as in PC6) in Mar Menor 
(Supplementary Figure S4). On the other hand, sequences from all 
three filters of Albufera tended to show a GC profile skewed towards 
high GC content (Supplementary Figure S4). Comparison to three 
other freshwater datasets (Lake Gatun in Panama, Lake Lanier in 
Atlanta, US 46 , and the River Amazon 24 ) (Supplementary Figure S4) 
does not show any kind of clear pattern, apart from a low GC peak 
(~45-50%) in all datasets except Albufera. So in this initial 
examination, the GC% profiles of both Mar Menor and Albufera 
appear quite different from other metagenomic datasets, and this 
already is an indication of the different communities in these 
ecosystems compared to other related available datasets. 
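
The kind of GC% profile discussed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for Supplementary Figure S4: the function names `gc_percent` and `gc_histogram`, the 5% bin width and the toy reads are all hypothetical, and ambiguous bases (e.g. N) are simply ignored.

```python
from collections import Counter


def gc_percent(seq: str) -> float:
    """GC% of one read, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("read contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_histogram(reads, bin_width=5):
    """Fraction of reads per GC% bin, keyed by the bin's lower edge."""
    counts = Counter()
    for read in reads:
        gc = gc_percent(read)
        # A read with exactly 100% GC is placed in the last bin.
        start = min(int(gc // bin_width) * bin_width, 100 - bin_width)
        counts[start] += 1
    total = sum(counts.values())
    return {start: n / total for start, n in sorted(counts.items())}


# Toy reads (hypothetical, for illustration only).
toy_reads = ["ATGC", "ATGC", "GGCC", "ATAT"]
toy_profile = gc_histogram(toy_reads)

if __name__ == "__main__":
    for start, fraction in toy_profile.items():
        print(f"{start:3d}-{start + 5:3d}% GC: {fraction:.2f}")
```

A unimodal profile, such as the one described for Mar Menor, would show a single run of high-fraction bins, whereas a bimodal profile (as described for PC6 and SS19) would show two separated runs.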

Community Structure. Among prokaryotes, the results of 
classification of the 16S rRNA sequences and all reads comparison 
to the NR database indicated almost exclusively the presence of 
Bacteria (Supplementary Figures S5, S6 and S7; Supplementary 
Tables S3, S4, S5 and S6). No archaeal 16S rRNA reads were 
detected in the Mar Menor dataset, and an extremely low number 
(<1%, n=322) of all metagenomic reads could be assigned to 
Archaea in Albufera. This extremely low fraction of reads from 
Albufera was assigned primarily to Euryarchaeota. This is indeed a 
little unusual, as Archaea are usually at least minor components of 
most systems (with exceptions, e.g. solar saltern crystallizer ponds), 
typically in the range of 5-10% 24 but in Mar Menor we have barely 
detectable levels of archaeal sequences. 

Phages. Among the most abundant organisms recruiting the maximum 
number of reads from the Mar Menor metagenome was a viral 
genome, that of Roseobacter phage SIO1 (see Supplementary Table 
S3). Roseophages are lytic podoviruses of Roseobacters, first isolated 
for Roseobacter SIO67, an aerobic, heterotrophic alphaproteobacterium. 
The currently sequenced Roseophages have been isolated from 
California near-shore locations. Comparative genome analysis of 
Roseophages has revealed largely conserved genomes, with three 
distinct pockets of variability (thyX gene, phosphate metabolism 
genes and structural genes like the tail-fiber protein). However, our 
sequence data indicate the presence of a population of organisms 
belonging to the order Rhodobacterales (see 16S rRNA section 
above). The average %identity of the metagenomic hits mapping to 
the Roseobacter genome was ~40%, i.e. rather low, so the dominant 
phage might be an abundant podovirus, similar to Roseophages, but 
its host specificity is as yet uncertain, as the host itself is as yet 
undescribed. In comparison to the nearly 11% of reads in the Mar Menor 
metagenome being assigned to phages, only ~3.6% of reads could be 
assigned to phages in Albufera. Even then, a phage genome, 
Prochlorococcus phage P-SSM2 appeared as a genome that recruited 
several hits in Albufera (Supplementary Table S4). P-SSM2 is a myo- 
virus, that is specific for cross-infections between Prochlorococcus 
strains 55 . However, there is no Prochlorococcus population in 
Albufera, so these reads likely belong to an abundant myovirus, 
which might be infecting the abundant Synechococcus or even 
Cyanobium. 

Figure 2 | Cross comparison of the distribution of 16S rRNA sequences from selected abundant high level bacterial taxa from the Mar Menor and 
Albufera metagenomes to several freshwater and saline metagenomes. Results from all filters have been combined. 

Alphaproteobacteria. Alphaproteobacteria form a large part (~33%) 
of the community in Mar Menor (Figure 2), similar to the DCM. The 
marine metagenome of the DCM is dominated by Candidatus 
Pelagibacter (belonging to the SAR11 cluster). In Mar Menor as well, 
the majority of alphaproteobacterial 16S sequences could be ascribed 
to the SAR11 cluster, and nearly half (43.6%, n=69) of all 16S reads 
to which we could assign a tentative genus could be affiliated to 
Candidatus Pelagibacter (Supplementary Table S5). Moreover, 
alphaproteobacterium HIMB114, also a member of the SAR11 cluster 
(a marine microbe, isolated from Hawaii), was among the organisms 
that recruited the maximum number of reads from the metagenome 
(Supplementary Table S3). These results point towards the abundance 
of a SAR11 representative in Mar Menor. In addition to these 
organisms (both belonging to the order Rickettsiales), a number of 
hits were classified into the order Rhodobacterales (~30% of all 
alphaproteobacterial reads), which is known to comprise, among 
others, abundant microbes (e.g. Marivita, Cetrimonas, Roseisalinus, 
Roseovarius were identified). Only a small number of reads were 
classified into the order Rhizobiales (~10% of all alphaproteobacterial 
reads). 

Contrastingly, in the similarly hypersaline lagoon of Punta 
Cormoran, which has a similarly high percentage of alphaproteobacterial 
reads, the SAR11 clade does not appear to have any abundant 
representatives, with only a very small minority of reads assigned to 
Candidatus Pelagibacter. The major taxa belonged to the order 
Rhodobacterales (e.g. Dinoroseobacter, Roseovarius, Loktanella etc.) 
and, to a lesser extent, Rhizobiales (rhizobacteria) (e.g. Parvibaculum, 
Mesorhizobium etc.) 17 . 

So it appears, firstly, that an as yet unknown but abundant SAR11 
cluster representative inhabits the hypersaline lagoon of Mar Menor, as 
supported by the recruitment plots of both Candidatus Pelagibacter 
and Alphaproteobacteria HIMB114 (Figure 3). Secondly, the 
Alphaproteobacteria inhabiting two hypersaline lagoons of similar 
salinity are substantially different. Punta Cormoran, the pristine 
lagoon, appears to have a thriving Roseobacter community compared 
to Mar Menor, which has both SAR11 representatives and Roseobacter 
species. 

In contrast to Mar Menor, in the freshwater Albufera, the 
Alphaproteobacteria are in a minority (~9% of all reads). This is 
surprising as they are usually detectable across a range of freshwater 
bodies 31 . Based on 16S rRNA phylogenetic analyses, freshwater 
Alphaproteobacteria have been divided into a number of different 
lineages, called alfI, alfII, alfIII, alfIV, alfV (LD12, sister group to 
SAR11), alfVI and alfVII 31,56 . In general, freshwater Alphaproteobacteria 
are not a well studied group, and we have very little 
information regarding their ecology, functional roles or genomic char- 
acteristics. However, the freshwater datasets chosen here do show 
clearly that there appears to be a wide variation in the abundance 
and occurrence of the freshwater alphaproteobacterial lineages 
(Supplementary Figure S8), particularly the complete absence of the 
LD12 clade in Albufera, which could mean that LD12 distribution 
might be affected by nutrient status, as supported by our results. 

Cyanobacteria. Cyanobacteria form a sizeable percentage of the marine 
microbial community, especially at the deep chlorophyll maximum 7 , 
and have been shown to progressively decrease in numbers with 
increasing salinity 17 . Here also, we can clearly see (Figure 2) that the 
number of cyanobacterial sequences declines from the marine 
dataset to 5% salinity, and that they are nearly absent at 19% salinity. The total 
percentage of cyanobacteria identifiable in the Mar Menor dataset is 
similar to that of the marine metagenome of the DCM, and nearly 
twice that of Punta Cormoran. The top organisms identified as 
cyanobacterial were only Synechococcus strains (e.g. WH7803, 
WH7805), which have been identified before in hypersaline habitats, 
both by 16S rRNA cloning studies 57 and by metagenomic analyses 17 . 
Comparisons of the metagenomic reads against Synechococcus gen- 
omes show a very high level of fragment recruitment (Figure 3), 
indicating close relatedness between the free-living cyanobacteria 
in Mar Menor to the already sequenced strains. However, in com- 
parison to the DCM, where the cyanobacterial population comprises 
both Prochlorococcus and Synechococcus, it appears that among 
free-living unicellular picocyanobacteria, Synechococcus alone 
contributes to the primary productivity of this system, where it 
accounts for ~10% of phytoplankton biomass (Figure 2), and the 
range of Prochlorococcus does not extend into the high salinity 
waters of Mar Menor. Indeed, at higher taxonomic levels, there 
appears to be very little difference between Mar Menor and the 
DCM, e.g. similar levels of Alphaproteobacteria, Cyanobacteria, 
Gammaproteobacteria and Bacteroidetes. Differences appear to emerge 
at the organismal level, e.g. the absence of Prochlorococcus in Mar 
Menor and a heterogeneous alphaproteobacterial population. 

Figure 3 | Fragment recruitment plots of selected organisms versus the Mar Menor and the Albufera metagenomes. The comparisons were done using 
BLASTN, and a minimum length of 50 bp and an e-value of 1e-5 was considered a hit. The X-axis is scaled in Mb and the Y-axis shows the % identity. 
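
The hit criteria given in the Figure 3 caption (minimum alignment length of 50 bp, e-value of 1e-5) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the BLASTN results are available in the standard 12-column tabular format (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score); the function names `recruited_hits` and `plot_points` are hypothetical and are not taken from the pipeline used in this study.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Hit(NamedTuple):
    read_id: str
    ref_id: str
    pct_identity: float
    aln_length: int
    ref_start: int  # leftmost coordinate of the hit on the reference genome
    evalue: float


def recruited_hits(lines: Iterable[str],
                   min_length: int = 50,
                   max_evalue: float = 1e-5) -> Iterator[Hit]:
    """Yield tabular BLASTN hits that pass the length and e-value cutoffs."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        f = line.split("\t")
        hit = Hit(
            read_id=f[0],
            ref_id=f[1],
            pct_identity=float(f[2]),
            aln_length=int(f[3]),
            # subject start/end are swapped for minus-strand hits
            ref_start=min(int(f[8]), int(f[9])),
            evalue=float(f[10]),
        )
        if hit.aln_length >= min_length and hit.evalue <= max_evalue:
            yield hit


def plot_points(hits: Iterable[Hit]) -> List[Tuple[float, float]]:
    """Return (genome position in Mb, % identity) pairs for a recruitment plot."""
    return [(h.ref_start / 1e6, h.pct_identity) for h in hits]
```

Each retained hit contributes one point, with the X coordinate in Mb and the Y coordinate in % identity, matching the axes described in the caption.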

The most striking characteristic of Lake Albufera is clearly the 
high abundance of cyanobacterial sequences (nearly 35% of all 16S 
rRNA sequences and nearly 23% of all metagenomic reads, see Figure 2 
and Supplementary Figure S7). Albufera is highly hypertrophic, which 
distinguishes it from other freshwater bodies previously studied, 
like the Amazon river, Lake Gatun and Lake Lanier (and from the 
other saline/hypersaline datasets), none of which display such 
cyanobacterial abundances. Both the Amazon and Lake Gatun show only 
a very small percentage of cyanobacteria (<2%), while Lake Lanier 
appears to have a little more (~6%). In comparison to Mar Menor, 
where we were able to identify mainly Synechococcus, the diversity 
of cyanobacteria in Albufera is clearly higher, with a number of 
different and abundant genera, e.g. Synechococcus, Cyanobium, 
Pseudanabaena and Merismopedia, all of which have been previously 
isolated from freshwaters and most of which have been detected in 
this lake 3 . 



Microscopic counts using the Utermöhl sedimentation technique 
on an inverted microscope (Supplementary Figure S2) are useful for 
distinguishing morphologically different species of a certain size, 
mainly ranging from nanoplankton to bigger planktonic 
microorganisms, including filamentous cyanobacteria and eukaryotic algae. 
This does not apply to picocyanobacteria, such as Cyanobium and 
Synechococcus, which jointly accounted for up to 48% of the 16S 
rRNA sequences assigned to a genus in samples from Albufera 
(Supplementary Table S6). Larger cyanobacteria, like colonial forms 
of the genus Merismopedia or filamentous species like Pseudanabaena, 
were detectable both by sequencing techniques (Supplementary 
Table S5) and by microscopy (Supplementary Figure S3), showing 
partial agreement between the two methods. 

Even though the measured chlorophyll a levels in Albufera are far 
higher (271.31 µg/l) than in Mar Menor (3.94 µg/l), the difference in 
the percentage of cyanobacteria (by 16S rRNA analysis) is not 
proportionately larger (~35% in Albufera and ~12% in Mar Menor). This 
is likely due to the presence of an enormous diversity and abundance 
of eukaryotic photosynthetic algae in Albufera, which are not well 
detected by sequencing owing to their much larger genome sizes but 
are identified clearly under the microscope (Figure 1, Supplementary 
Figure S3). As in Mar Menor, we found no evidence for the presence of 
Prochlorococcus in the metagenomic data from Albufera, although 
Prochlorococcus-like populations have been reported in freshwater 
systems before 58 , and a study on Yellowstone Lake has also detected 
Prochlorococcus ecotypes in freshwater 59 . Also, even though we 
detected small amounts of chlorophyll b (Supplementary Table S1), and 
Prochlorococcus cells have characteristic divinyl derivatives of 
chlorophyll a and b, this can most likely be attributed to 
chlorophytes, which also have these pigments and were identified as 
very abundant by microscopic counts (Supplementary Figure S2). 
Another indication of the relative homogeneity of the cyanobacterial 
populations in Mar Menor, compared to Albufera, is visible in the 
GC% profile of the cyanobacterial reads (Supplementary Figure S9), 
i.e. a single peak at ~62% in Mar Menor, while in Albufera there are 
two distinct peaks (one at ~55% and the other at ~62%). 

Figure 4 | PCA of tetranucleotide frequencies of assembled contigs from Mar Menor and Albufera. Only those contigs longer than 2 kb that had a 
consistent phylogenetic profile are shown (see methods). 

Verrucomicrobia. This group of microbes is another point of 
difference between the pristine Punta Cormoran and Mar Menor. 
Verrucomicrobia are widely distributed and have been isolated from a 
number of different habitats, e.g. soils, lakes, marine sediments and 
hot springs, and even from man-made ecosystems like acid rock 
drainage and municipal solid-waste landfill leachates 60 . They are 
recognized as an increasingly significant group of soil bacteria and 
according to several estimates may comprise up to 10% of total 
bacteria in soil 60 . In Mar Menor we find that a microbe related to 
Coraliomargarita akajimensis (isolated from seawater 61 ) is quite 
abundant (Supplementary Table S3). Another abundant organism (by 16S 
rRNA) was Haloferula, which lacks a sequenced genome. However, 
Haloferula species have been isolated from marine environments 62 , 
so it is likely these are close relatives. Moreover, it is clear from 
Figure 2 that the Verrucomicrobia are abundant in Albufera as well. 
However, here, instead of Coraliomargarita or Haloferula (which 
appear to be more salt-tolerant), there is a verrucomicrobium 
related to Chthoniobacter 63 (which was isolated from soil). 



Actinobacteria. Actinobacteria have been primarily thought of as soil 
bacteria. This can be attributed to the ease of cultivation of this group, 
which have been referred to as high GC microbes. However, several 
studies using different approaches (16S rRNA, FISH and metagenomics) 
have now shown that Actinobacteria are very common and abundant 
members of freshwater communities 24,30,56,64,65 , and many are not 
even high GC 17,23,24 . The abundance of Actinobacteria varies 
greatly across the datasets (Figure 2). In the Albufera metagenome 
only 5-6% of reads (by 16S, and by all reads) could be assigned to 
this group. This is an extremely low percentage relative to the other 
freshwater datasets (e.g. Amazon ~20%, Lake Gatun ~40% and Lake 
Lanier ~20%). This reduced relevance of Actinobacteria is indeed 
striking. 

Some saline datasets also show an abundant actinobacterial presence, 
e.g. ~24% of all 16S rRNA reads in the hypersaline lagoon Punta 
Cormoran are actinobacterial. This is in sharp contrast to the very 
low numbers in the DCM (~2%) or Mar Menor and SS19 (~5% each). In 
addition, most of the actinobacterial reads from the Mar Menor 
metagenome were high GC (Supplementary Figure S10), while those from 
Albufera showed three clear GC% peaks, indicating that in spite of 
the low number of actinobacterial reads, there might be at least 
three different clades of Actinobacteria present here. 
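
A GC% profile of the kind referred to above can be sketched as a simple histogram of per-read GC content. The following is only an illustrative sketch: the bin width, the strict-maximum peak rule and the function names (`gc_percent`, `gc_profile`, `local_peaks`) are assumptions made here and are not the procedure used to produce the supplementary figures.

```python
from collections import Counter
from typing import Dict, Iterable, List


def gc_percent(seq: str) -> float:
    """GC content of a read in percent, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_profile(reads: Iterable[str], bin_width: int = 1) -> Dict[int, int]:
    """Count reads per GC% bin; keys are the lower edges of the bins."""
    counts: Counter = Counter()
    for read in reads:
        lower_edge = int(gc_percent(read) // bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


def local_peaks(profile: Dict[int, int], bin_width: int = 1) -> List[int]:
    """Bins whose count is strictly higher than both neighbouring bins.

    Flat-topped peaks are not reported by this simple rule.
    """
    return [
        b for b, c in sorted(profile.items())
        if c > profile.get(b - bin_width, 0) and c > profile.get(b + bin_width, 0)
    ]
```

On real data the profile would normally be smoothed or binned more coarsely before looking for peaks, since single-bin noise would otherwise be reported as spurious maxima.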

We examined the 16S rRNA actinobacterial reads from all these 
datasets in the framework of a well-defined taxonomy (Figure 5), in 
which the freshwater taxa have been classified into seven lineages 
(~10-15% identical in 16S rRNA to each other) 31 . Each lineage is 
subclassified into clades (≥95% identity to at least one member), 
and clades into tribes (≥97% identity to at least one member). The 
results of this classification show the variation in abundance of these 
lineages across all the datasets. However, apart from these 
differences, it is very difficult to draw further conclusions, as 
there is not yet a single sequenced representative of the low GC 
Actinobacteria. 

Figure 5 | Classification of actinobacterial 16S reads from Albufera, Mar Menor and several other metagenomic datasets into known lineages of 
freshwater Actinobacteria. The numbers above the bars indicate the total number of actinobacterial 16S sequences detected in each dataset. 
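
The identity thresholds stated above (≥95% to at least one member for a clade, ≥97% for a tribe) amount to a simple decision rule, sketched below. How the pairwise 16S identities were actually computed is not described in this section, so the sketch takes them as given; the function name `finest_level` and the reference names used in the usage note are hypothetical.

```python
from typing import Dict, Optional, Tuple

CLADE_MIN_IDENTITY = 95.0  # % identity to at least one clade member
TRIBE_MIN_IDENTITY = 97.0  # % identity to at least one tribe member


def finest_level(identities: Dict[str, float]) -> Tuple[str, Optional[str]]:
    """Return the finest taxonomic level supported by the best reference match.

    `identities` maps a reference sequence name to the % identity between
    the 16S read and that reference.
    """
    if not identities:
        return ("unresolved", None)
    best_ref = max(identities, key=identities.get)
    best = identities[best_ref]
    if best >= TRIBE_MIN_IDENTITY:
        return ("tribe", best_ref)
    if best >= CLADE_MIN_IDENTITY:
        return ("clade", best_ref)
    return ("unresolved", None)
```

For example, a read that is 97.5% identical to a hypothetical reference "refA" and 90% identical to "refB" would be placed at tribe level with "refA", whereas a read whose best match is 96% would only be resolved to a clade.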



Both Mar Menor and Albufera contain very similar percentages of 
actinobacterial reads. However, they differ in the type of their 
resident Actinobacteria. The majority of reads in Mar Menor could be 
affiliated to the Luna1, Luna3, acIII and acIV lineages. Albufera also 
has acIII and acIV, but the Luna3 lineage is absent and several others 
are present in small numbers. The Amazon River and Lake Gatun show 
very similar populations, with the acI-C clade being the most 
abundant. Lake Lanier is quite different from both of these and has 
acI-A as the dominant clade. But Albufera is drastically different 
from any of the other freshwater datasets, with the acI lineage 
completely absent. Instead, acIII and acIV are nearly equally 
dominant. The saline samples also show a different trend. Very few 
reads are detected in the DCM, so these might not be very reliable, 
but Mar Menor and PC6 actually appear quite similar in their 
actinobacterial load, apart from the extra presence of the acSTL 
lineage in PC6. However, one of the most striking results is the 
total dominance of the Luna1 lineage (previously called acII) in the 
SS19 dataset. It is also present in significant amounts in Mar Menor 
and Punta Cormoran. Broadly, however, the data clearly show the 
separation between the various lineages on grounds of salinity. For 
example, the acI lineage, without doubt among the most abundant 
freshwater lineages, is restricted to freshwater alone and is not 
found in saline habitats. However, none of these lineages has a 
sequenced representative yet, so we cannot speculate further on the 
nature of these differences. 

Betaproteobacteria. Betaproteobacteria are among the most dominant 
taxa in freshwater systems. This has been shown by several approaches 
(16S rRNA, FISH, metagenomics) 24,64,66 . In terms of simple 
abundance, comparing Albufera (~6%) with the other freshwater 
datasets, only Lake Gatun has a similar level of betaproteobacteria 
(~3%), while the Amazon and Lake Lanier appear to have very high 
levels (nearly 20%) (Figure 2). Although betaproteobacteria are 
detectable in Albufera, the most prominent, nearly universally 
present, and arguably the best studied freshwater betaproteobacterium, 
Polynucleobacter, was conspicuous by its absence. Moreover, only a 
handful of betaproteobacterial 16S rRNA sequences could be affiliated 
to known genera (Methylibium, 3 sequences, and Thauera, 1 sequence), 
while the others were all unclassified betaproteobacteria. 
Comparisons using all reads did suggest that nearly 6% of all 
sequences in Albufera were betaproteobacterial (Supplementary 
Figure S7). The ubiquity and absence of Polynucleobacter across a 
wide variety of lakes of different characteristics (altitude, pH, water 
chemistry, landscape position, trophic status etc.) has been discussed 
extensively before 28 , and it has been suggested that high levels of 
dissolved organic carbon are negatively correlated with the abundance 
of this microbe. In Albufera, we detected very high values of dissolved 
organic matter (data not shown), and it is likely that this factor is 
important in the absence of this otherwise ubiquitous microbe from 
this habitat. 

Freshwater betaproteobacteria are broadly divided into seven 
lineages (betI to betVII) based on 16S rRNA phylogenetic analyses 31 . 
We classified the betaproteobacterial reads from several datasets into 
these lineages (Supplementary Figure S11). The Amazon and Lake Lanier 
both showed a wide variety of lineages, with nearly equal amounts of 
the betI lineage. Indeed, the betI lineage does appear to be nearly 
universal across all freshwater datasets. This lineage does have 
some cultured representatives (e.g. Limnohabitans 67,68 ). However, 
some lineages of betaproteobacteria found in the two datasets differ, 
e.g. betIII (order Burkholderiales) is dominant in Lake Lanier and 
betIV (order Methylophilales) in the Amazon. 

The betII lineage, to which Polynucleobacter belongs, is seen only 
in two datasets (Amazon and Lake Lanier). Albufera also contained 
sequences belonging to the betVI lineage (~50%). In Albufera, the 
betaproteobacterial sequences appear nearly evenly divided between 
the betI and the betVI lineages, with a small amount of betIV 
sequences. The betI and the betVI lineages appear to be widely 
distributed in all freshwater datasets. But apart from this, there 
does not appear to be any simple commonality in the distribution of 
the lineages within the datasets studied, with each dataset having 
its own characteristic features. More metagenomic datasets 
complemented with environmental data will be required to elucidate 
more clearly the reasons for the distribution of these lineages. 

We also examined in more detail the distribution of 
Polynucleobacter specifically in all the metagenomic datasets 
compared in this study. In the betII lineage there are four different 
"tribes", namely PnecA, B, C and D, named after Polynucleobacter 31 . 
The four tribes refer to different Polynucleobacter species, i.e. 
PnecA, B, C and D refer to P. rarus, P. acidiphobus, P. necessarius 
and P. cosmopolitanus respectively. Only the Amazon and the Lake 
Lanier datasets showed evidence of the presence of Polynucleobacter. 
However, both are quite enriched in Polynucleobacter (betII lineage). 
More specifically, all four tribes PnecA, B, C and D were identified 
in the Amazon dataset, while only PnecB was identified in the Lake 
Lanier dataset. We could not identify any Pnec 16S sequences in any 
of the other datasets. 

It is clear, however, that betaproteobacteria are not numerically 
abundant in saline waters, e.g. in Mar Menor, only about 1% of all 
the reads could be assigned to betaproteobacteria (Supplementary 
Figure S7). They are at similarly low levels in the Deep Chlorophyll 
Maximum, Punta Cormoran and SS19 datasets as well. This is in 
concordance with earlier results on the low abundance of 
betaproteobacteria in marine metagenomic datasets 8 . 

Eukaryotes. From the collected metagenomic data it is possible to 
identify eukaryotic sequences (~12% in Mar Menor, ~2% in Albufera) by 
comparison to the complete NR database. Indeed, the number of 
eukaryotic reads increased progressively with increasing filter size 
(Supplementary Figure S5). The total numbers of 18S sequences 
identified in Mar Menor and Albufera were 28 (~5% of total SSUs) and 
22 (~5% of total SSUs) respectively. The main eukaryote identified 
in Mar Menor was Alexandrium (~18%, n=5), a marine armored 
dinoflagellate that produces neurotoxins that cause paralytic shellfish 
poisoning. Alexandrium is well known in coastal lagoons in the 
Mediterranean 69 and has both autotrophic and heterotrophic species. 
Alexandrium blooms are harmful and are famously referred to as red 
tides. The toxins it produces can have adverse effects when consumed 
by humans, usually in the form of contaminated seafood (shellfish, 
fish etc.) 70 . Moreover, these blooms are common in coastal habitats, 
affect marine trophic structure, increase the mortality of marine 
fish, birds and mammals, and disrupt recreational activities. 
Dinoflagellate blooms are usually correlated with increased levels 
of reduced nitrogen sources, particularly ammonia and urea (at least 
for Alexandrium) 71 . Photosynthetic dinoflagellates can supplement 
photosynthetic growth with organic sources, and the increase in the 
levels of inorganic nutrients (particularly nitrogen and 
phosphorus) 72 , coupled with their ability to produce paralyzing 
toxins, makes them strong competitors in eutrophic systems, affecting 
multicellular and unicellular life alike 73 . However, toxin 
production by Alexandrium is inconsistent, and not all species are 
toxic. In addition to Alexandrium, other dinoflagellates were also 
detected (e.g. Gymnodinium, Protoceratium). 

Another abundant organism detected by 18S rRNA in Mar Menor 
was Chrysochromulina (n=4), which is a haptophyte from the class 
Prymnesiophyceae. Haptophytes (e.g. Chrysochromulina, Phaeocystis, 
Prymnesium) are all bloom forming organisms. The particular feature 
of haptophytes is the presence of a haptonema, a flagellum-like 
(though only superficially), retractile, coiled protuberance that 
performs several functions (e.g. sensory responses, prey 
capture) 74 . Chrysochromulina is also photosynthetic and (like some 
Alexandrium species) can supplement photosynthetic growth by 
mixotrophic feeding. Indeed, some Chrysochromulina species are 
euryhaline as well, with an optimum salinity for growth 75 much 
higher than marine levels. 

In a microscopic examination and enumeration of the planktonic 
species, we were able to identify a number of abundant diatoms (e.g. 
Cyclotella, Entomoneis, Nitzschia). Cyclotella was identified by its 18S 
rRNA sequence in the metagenomic data as well. It is a well known, 
abundant centric diatom. Some Cyclotella species are known to be 
associated with high nutrient concentrations, particularly 
phosphorus, and thus with polluted, eutrophic waters 76,77 . However, 
the most abundant organism by far identified by microscopy was the 
dinophyte Gyrodinium. In contrast with diatoms, whose main sequences 
(Cyclotella sp.) corresponded to taxa already identified by 
microscopic observations, molecular identifications of 
dinoflagellates (dinophytes) did not coincide with microscopic 
determinations, which demonstrates that the taxonomy of this group 
is still far from resolved, even though in our microscopic 
determinations we only considered autotrophic or mixotrophic species 
that contain chloroplasts. 

Apart from protists, crustacean copepods, which are among the 
most important groups of marine invertebrates 78 , particularly for 
the carbon flux in the food web of the oceans 79 , were identified as 
well in Mar Menor (e.g. Paracyclopina, Oithona, Diarthrodes). These 
can be considered the zooplanktonic community in the lagoon. 
Paracyclopina species can be found in brackish waters but are tolerant 
to high salinities as well 80 . 

The planktonic community of protists and zooplankton in 
Albufera was clearly different from the dinoflagellate and copepod 
dominated Mar Menor. The 18S rRNA sequences from Albufera 
could be assigned primarily to diatoms (e.g. Cymbella, Nitzschia, 
Sellaphora), which also matched the microscopic determinations 
(Supplementary Figure S2), or ciliates (e.g. Halteria, Strombidium). 
While microscopic observations confirmed the presence of Nitzschia 
and Cyclotella, the vast majority of organisms in the sample were 
identified as filamentous or colonial nanoplanktonic cyanobacteria 
(prokaryotes), primarily Merismopedia, which forms a dense 
layer of loosely arranged cells in a somewhat planar (rectangular or 
square) topology, sometimes enclosed by a mucilaginous matrix. 
Merismopedia is commonly found floating in freshwater; several 
species are planktonic, and it can also be found in somewhat saline 
habitats (e.g. coastal areas) or even in thermal springs. It is 
distributed all over the world 81 . In addition to the abundant 
cyanobacterial (prokaryotic) taxa, several types of chlorophytes (e.g. 
species of Pediastrum, which accounted for a large portion of 
the phytoplankton biovolume, Coenochloris, Chlamydomonas, 
Tetraedron, Scenedesmus, etc.) were detected. The photosynthetic 
organisms in Albufera clearly dwarfed those present in Mar 
Menor, both in sheer numbers and in diversity. 

Rhodopsins. We identified 52 rhodopsin sequences in the Mar 
Menor dataset and 34 in Albufera. In Mar Menor, though 
Firmicutes represented less than 1% of the classified sequences, 10 
sequences (nearly 20% of all rhodopsin sequences) appeared related 
to firmicute rhodopsins (Exiguobacterium sp.). Nearly all other 
sequences in Mar Menor were related to proteobacterial rhodopsins 
(primarily from Alphaproteobacteria and Gammaproteobacteria). In 
Albufera, the phylogenetic distribution of rhodopsins appeared more 
diverse, with the majority affiliated to 
Proteobacteria (11 sequences) and Planctomycetes (10 sequences). 
In addition, actinorhodopsins (7) and firmicute rhodopsins (4) were 
also found. 

Metagenomic assembly. Assembly of the metagenomes resulted in a 
total of 104 contigs from Mar Menor and 35 contigs from Albufera 
(see methods for details). More than three-quarters of all contigs 
(77%, n = 80) assembled from Mar Menor were alphaproteobacterial 
(average GC 51.7%, average length 3.1 kb, total length 
250 kb). The only other fraction of significant size among the 
assembled contigs could be assigned to viruses (12%, n = 8, total 
length = 34 kb). A small number of actinobacterial contigs (n = 6, 
average length 2.8 kb, total length 17 kb) could also be assembled. 
Five of these contigs were high GC (57 to 60%), while the last contig 
(size 4.3 kb) had a much lower GC content (43%) and contained, among 
some hypothetical genes, the genes coding for the alpha and beta 
subunits of ribonucleotide reductase, which are crucial for the 
conversion of ribonucleotides to deoxyribonucleotides. 

Figure 6 | PCA of tetranucleotide frequencies of actinobacterial contigs from Mar Menor. For reference, actinobacterial contigs from Lake Gatun, 

We performed a principal component analysis on the tetranucleotide 
frequencies of the assembled contigs (see methods) (Figure 4). In 
this analysis it is possible to observe, besides the actinobacterial 
cluster formed by 5 (high GC) of the 6 actinobacterial contigs (shown 
in yellow), 4 other clusters corresponding to cyanobacterial, 
gammaproteobacterial, alphaproteobacterial and viral contigs. The 
largest cluster is formed by the alphaproteobacterial contigs. This 
cluster is not close to the reference genomes of two organisms of the 
SAR11 cluster (found by recruitment), namely Candidatus Pelagibacter 
(GC = 29.7%) and Alphaproteobacterium HIMB114, but is instead closer 
to Candidatus Puniceispirillum marinum (GC = 48.9%), which is a 
member of the SAR116 cluster. A total of 320 genes were predicted in 
the 80 alphaproteobacterial contigs; of these, 120 genes gave a best 
hit to Rhizobiales (mean similarity 72.38%), while 108 genes gave 
best BLAST hits to Rhodobacterales (mean similarity 73%). The cluster 
did not appear to be related to Rickettsiales. So even though it 
appears, by 16S rRNA analysis, that there are at least two 
unidentified microbes in Mar Menor, one related to Rickettsiales 
(SAR11) and the other to Rhodobacterales, we did not assemble any 
reads from the SAR11-related microbe, only from the other. 
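
The analysis described above, a PCA on tetranucleotide frequencies of contigs, can be sketched as follows. This is a minimal illustration and not the implementation used for Figure 4: it assumes NumPy is available, counts overlapping 4-mers on one strand only (a strand-symmetric count would be an equally plausible choice), and computes the principal components through a singular value decomposition of the mean-centered frequency matrix. The names `tetra_freq` and `pca_scores` are hypothetical.

```python
from itertools import product
from typing import List

import numpy as np

# All 256 tetranucleotides in a fixed order, with a lookup of their column index.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tetra_freq(seq: str) -> np.ndarray:
    """Relative frequencies of overlapping 4-mers; windows with N are skipped."""
    seq = seq.upper()
    counts = np.zeros(len(KMERS))
    for i in range(len(seq) - 3):
        j = INDEX.get(seq[i:i + 4])
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def pca_scores(contigs: List[str], n_components: int = 2) -> np.ndarray:
    """Project contigs onto the first principal components of their 4-mer profiles."""
    matrix = np.vstack([tetra_freq(c) for c in contigs])
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

Each row of the returned array gives the coordinates of one contig on PC1 and PC2; contigs with similar oligonucleotide composition fall close together, which is the basis for the clusters discussed in the text.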

In one of these assembled alphaproteobacterial contigs, we 
identified a nearly complete cluster of Sox genes, which provide 
the necessary apparatus for performing sulfur oxidation. This 
cluster has been demonstrated to operate in photo- and 
chemotrophic Alphaproteobacteria that oxidize thiosulfate to sulfate 
without the formation of inorganic sulfur globules as a free 
intermediate, and was first described in the alphaproteobacterium 
Paracoccus pantotrophus, a facultative lithoautotrophic organism 
that grows with thiosulfate (and other electron donors, e.g. 
molecular hydrogen) as an energy source 82 . The cluster of P. 
pantotrophus coding for sulfur-oxidizing proteins comprises at least 
two transcriptional units with 15 genes. Seven genes, soxXYZABCD, 
code for proteins essential for constituting a periplasmic system 
for sulfur oxidation in vitro and are induced by thiosulfate. The 
SoxY gene encodes a C-terminal invariant binding site motif 
(VKVTIGGCGG) that binds different oxidation states of sulfur 83 . The 
exact motif was present in the SoxY gene of the assembled contig, 
providing more confidence in the function assignment and assembly. 
Although several pathways of thiosulfate oxidation are known, 
two main pathways exist, the difference between them being 
related directly to the presence or absence of the SoxCD genes 84 . 
In the presence of SoxCD proteins, thiosulfate is converted to two 
sulfate molecules, and this is the pathway in P. pantotrophus, 
while in the absence of SoxCD, only a single sulfate is produced, 
the other sulfur atom being deposited in the form of inorganic 
globules (e.g. in Beggiatoa). The assembled contig clearly possesses 
a SoxC gene, while the SoxD part was likely not assembled. So it 
appears that the organism to which this cluster belongs is able to 
fully oxidize thiosulfate to two sulfates and does not deposit any 
sulfur granules, either intra- or extracellularly. 
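
The motif check described above can be illustrated with a short sketch that tests whether a predicted protein sequence carries the invariant VKVTIGGCGG motif at its C-terminus. The function name `has_c_terminal_motif` and the `tail` parameter (the number of residues tolerated after the motif) are assumptions introduced here for illustration; the test sequences in the usage note are invented and are not the assembled SoxY sequence.

```python
SOXY_MOTIF = "VKVTIGGCGG"  # invariant C-terminal motif quoted in the text


def has_c_terminal_motif(protein: str, motif: str = SOXY_MOTIF, tail: int = 0) -> bool:
    """True if the last occurrence of `motif` ends within `tail` residues of the C-terminus.

    A trailing '*' (stop codon symbol in translated sequences) is ignored.
    """
    protein = protein.upper().rstrip("*")
    pos = protein.rfind(motif)
    if pos == -1:
        return False
    residues_after_motif = len(protein) - (pos + len(motif))
    return residues_after_motif <= tail
```

With the default `tail=0` the motif must be the very end of the protein, so an invented sequence such as "MKTLLAAVKVTIGGCGG" passes while "MKVKVTIGGCGGAAAAAA" does not, unless `tail` is raised to 6 or more.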

Comparison of the assembled Sox gene cluster with the sox 
gene clusters of Roseobacter sp. MED193 and Aurantimonas 
manganoxydans SI85-9A1 (Supplementary Figure S12) showed 
nearly complete synteny between the genomic regions and the 
assembled contig. This suggests that the organism to which these 
contigs belong is a novel sulfur-oxidising alphaproteobacterium, likely 
adapted to a higher salinity. This is interesting because the close 
relatives of this microbe, e.g. Candidatus Pelagibacter, 
Alphaproteobacterium HIMB114 and Candidatus Puniceispirillum, 
do not have the Sox cluster in their genomes and are likely incapable 
of sulfur oxidation. 

Some other contigs that were assembled from the data for Mar 
Menor could be assigned to cyanobacteria (Figure 4). These contigs 
appeared closely related to Synechococcus species. However, the 
assembly from Albufera represented a much more diverse set of 
contigs, with several contigs assembled, primarily from cyanobac- 
teria (66%) but also from other taxa (Viruses 9%, Bacteroidetes 11% 
and Betaproteobacteria 6%). 

A more focused analysis of the assembled actinobacterial contigs 
from Mar Menor, in the context of actinobacterial contigs from other 
metagenomic datasets, is shown in Figure 6. We collected 
actinobacterial contigs from Lake Gatun and Punta Cormoran (see 
methods), and also three fully sequenced actinobacterial fosmids from 
Lake Kinneret 85 . We could identify at least six distinct clusters, 
each representing a dominant lineage of freshwater actinobacteria. 
Because of the presence of 16S sequences in the contigs, at least 
three of the clusters can be assigned a tentative name, i.e. two 
subgroups of acI, acI-A and acI-B1, and the lineage acIV. In 
comparison to Lake Gatun, the contigs from Punta Cormoran appear to 
have higher GC content (see Clusters 4, 5 and 6 in Figure 6). Also, 
of the 6 assembled contigs from Mar Menor, five cluster very clearly 
with Punta Cormoran contigs in Cluster 5 (GC% 55-62). Only a single 
contig, a low GC contig, clusters with the acI-B1 cluster (Cluster 1). 
So it does appear that there is a minor low GC actinobacterial 
population in Mar Menor (also seen in the GC% profile of Mar Menor 
actinobacterial reads, Supplementary Figure S10). No rRNA sequences 
were detected either in the contigs from Punta Cormoran or in those 
from Mar Menor, so assignment of names to clusters 4, 5 or 6 was not 
possible. Also, since a number of different actinobacterial clades 
are nearly equally abundant (Figure 5), it is not possible to 
associate these contigs with the known actinobacterial lineages with 
any degree of confidence. 

Discussion 

In this work, we have compared the relative relevance of the different 
groups of microorganisms in two coastal lagoons, one freshwater and 
another hypersaline, with other related aquatic systems, in the frame- 
work of the main environmental features characterizing them. The 
analysis of the metagenomic and metacommunity data of these two 
hypertrophic lagoons has revealed interesting general patterns. We 
have discovered, using assembly of the metagenomic data, a novel, as 
yet uncultured, sulfur oxidizing alphaproteobacterium, that is 
abundant in the hypersaline Mar Menor. We also found evidence 
of the presence of only Synechococcus as the abundant cyanobacteria 
and the complete absence of Prochlorococcus, which is abundant in 
the parent Mediterranean water body from which Mar Menor waters 
are derived. Also, even the freshwaters of Albufera, though abundant 
in cyanobacteria, did not show any indication of presence of Pro- 
chlorococcus. Microscopy and sequence data of the phytoplankton 
revealed differences in the two lagoons, Mar Menor dominated by 
dinoflagellates and Albufera by chlorophytes, while diatoms were 
observed in both. The main distinctive characteristic of Albufera is 
its highly hypertrophic status. It contained a considerably different 
microbiota from less nutrient rich freshwaters. Importantly, 
canonical freshwater microbial groups like low GC Actinobacteria, the 
LD12 lineage of Alphaproteobacteria and even the cosmopolitan 
betaproteobacterium Polynucleobacter are all conspicuously absent. 
That cyanobacteria are a major component of hypertrophic waters (like 
Albufera) is not new; however, the absence of the other major 
freshwater microbes certainly is a significant departure from other 
freshwater systems. Many of the groups that are absent in Albufera, 
such as the low GC Actinobacteria or the LD12 lineage, are very small 
bacteria with a high surface to volume ratio, which might be an 
advantage in low nutrient situations. However, we speculate that this 
competitive edge is likely to be lost in a hypertrophic situation like the 
one in Albufera, where fast growing cyanobacteria might become 
dominant. 

Methods 

Sampling. Samples were collected from Albufera de Valencia on May 12, 2010 
(39°19'54"N, 0°21'8"W) and on May 7, 2010 from Mar Menor (37°43'08"N, 
00°47'14"W) as part of the J. Craig Venter Institute European Sampling Expedition. 
Approximately 20 L (Albufera de Valencia) and 40 L (Mar Menor) were sequentially 
filtered using three different filter sizes (0.1 µm, 0.8 µm and 3 µm). Filters were stored 
at -80°C in protective buffer (10 mL sterile filtered sampling water, 10 mL 
RNAlater, 200 µl 100x TE buffer, 400 µl 0.5 M EGTA and 400 µl 0.5 M EDTA) until 
DNA extraction. Filters were then thawed on ice and treated with 1 mg/ml 
lysozyme and 0.2 mg/ml proteinase K (final concentrations). Nucleic acids were 
extracted with phenol/chloroform/isoamyl alcohol and chloroform/isoamyl alcohol 
and DNA integrity was checked by agarose gel electrophoresis. Samples were 
sequenced using the Roche 454 GS-FLX system, titanium chemistry. The raw data for 
all samples have been deposited in the CAMERA database (Community 
Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis, 
http://camera.calit2.net) and are publicly available. 

Analytical methods: water chemistry, pigment analyses by HPLC, and
microscopic counts. A series of physical, chemical, and biological determinations
was made on replicate water samples collected at the same time as those used for
the metagenomic analyses. Electrical conductivity was measured with a WTW LF-191
conductivity meter, and pH with an Orion electrode ion Analyzer EA920 (Orion
Research). Soluble reactive phosphorus (SRP) was determined by the phosphomolybdic
ascorbic acid method, and nitrate was analyzed after reduction in a cadmium column by
the Griess method, both on samples filtered in situ (through GF/F glass-fiber
filters). Alkalinity was measured by titration with HCl and chloride was determined
by the argentimetric method. Analyses were performed following Standard Methods
for Water Analyses. Ammonium was also measured on filtered water using the
modified indophenol blue method. Unfiltered samples for total phosphorus and total
nitrogen determinations were digested through a double alkaline-acid persulfate
digestion. Once extracted, total phosphorus was determined as SRP after pH
neutralization. Total nitrogen was also determined on the digested samples following
Bachmann and Canfield 86. Carbon forms (CO3, CO3H and CO2) were calculated
from pH and alkalinity measurements following Rodier 87. Chromophoric dissolved
organic matter (CDOM) was quantified by means of the excitation-emission matrix
(EEM) method 88 using an F-7000 Hitachi fluorescence spectrophotometer.
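
The calculation of carbon forms from pH and alkalinity can be illustrated with a minimal sketch. It is not Rodier's procedure: it assumes that the measured alkalinity is entirely carbonate alkalinity (contributions from OH-, H+, borate and other bases are neglected) and uses generic dilute-water dissociation constants at 25°C (pK1 = 6.35, pK2 = 10.33) as placeholder defaults. The function name and parameters are hypothetical, and a hypersaline water body such as Mar Menor would need constants corrected for salinity and temperature.

```python
def carbonate_species(ph, alkalinity_eq_per_l, pk1=6.35, pk2=10.33):
    """Return (CO2, HCO3-, CO3 2-) in mol/L from pH and carbonate alkalinity.

    Assumption: alkalinity (eq/L) = [HCO3-] + 2 [CO3 2-]; the default
    pK values are illustrative and not taken from the paper.
    """
    h = 10.0 ** (-ph)     # hydrogen ion concentration
    k1 = 10.0 ** (-pk1)   # CO2 + H2O <-> H+ + HCO3-
    k2 = 10.0 ** (-pk2)   # HCO3- <-> H+ + CO3 2-

    # Substitute [CO3 2-] = K2 [HCO3-] / [H+] into the alkalinity definition.
    hco3 = alkalinity_eq_per_l / (1.0 + 2.0 * k2 / h)
    co3 = k2 * hco3 / h
    co2 = h * hco3 / k1
    return co2, hco3, co3
```

Passing site-specific constants through `pk1` and `pk2` keeps the speciation step separate from the choice of equilibrium constants, which is where most of the uncertainty in such a calculation lies.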

Phytoplankton abundance in Lugol-fixed samples was determined with an
Olympus IX50 inverted microscope using the Utermöhl sedimentation method 89.
Algae were identified according to several published taxonomic keys 90-93. Samples for
photosynthetic pigment determination were collected onto GF/F filters and
extracted in the dark, overnight at −20°C, with 100% acetone and several rounds of
sonication. Extracts were injected into a Waters HPLC system with a Waters 996
photodiode array detector. The system uses two columns (Spherisorb S5 ODS2) in series,
running at 35°C in a methanol/ammonium acetate/acetone gradient following
Pinckney et al. 94 and Van Heukelem et al. 95. Peaks were identified according to their
absorption spectra and concentrations were calculated using commercial standards
(DHI, Denmark).

For the cytometric identification and quantification of the bacterioplankton and
APP cells, a Coulter Cytomics FC500 flow cytometer equipped with an argon laser
(488 nm excitation), a red emitting diode (635 nm excitation), and five filters for
fluorescent emission (FL1-FL5) was used. Bacterioplankton abundance was determined
with the argon laser by green fluorescence (SYBR Green I) using the FL1 detector
(525 nm). APP abundance was determined by combining the argon laser and the red diode,
by red fluorescence (chlorophyll-a and phycobiliprotein autofluorescence) using the
FL4 detector (675 nm).

Community Structure. 16S ribosomal RNA genes were identified by comparing the
datasets against the RDP database 37. All reads that matched an rRNA sequence in the
database with an alignment length of more than 100 bases and an e-value threshold of
0.001 were extracted. The best hit was used to assign each read to a high-level taxon.
When possible, the sequences were further assigned to a genus if they shared ≥95%
rRNA sequence identity with a known species. Moreover, the 16S sequences were also
run through the Metaxa program 96 to cross-check the identified genera. Additionally, the
entire datasets were compared to the NCBI NR database (using BLASTX, e-value 1e-5) and
analysed using the MEGAN software 97. Classification of 16S rRNA reads into specific
reference taxonomies was performed using mothur 98.
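
The read-level assignment rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline actually used: the `Hit` record and its field names are hypothetical, the e-value of 0.001 is treated as an upper bound, and "best hit" is taken to mean the hit with the highest bit score.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    # Hypothetical record for one database hit of one read.
    read_id: str
    high_taxon: str   # e.g. "Alphaproteobacteria"
    genus: str
    identity: float   # percent identity of the alignment
    aln_len: int      # alignment length in bases
    evalue: float
    bitscore: float


def assign_reads(hits, min_len=100, max_evalue=1e-3, genus_identity=95.0):
    """Map read_id -> (high-level taxon, genus or None)."""
    best = {}
    for h in hits:
        # Keep only alignments longer than 100 bases within the e-value bound.
        if h.aln_len <= min_len or h.evalue > max_evalue:
            continue
        # Assumption: the best hit is the one with the highest bit score.
        current = best.get(h.read_id)
        if current is None or h.bitscore > current.bitscore:
            best[h.read_id] = h
    # Report a genus only when identity reaches the 95% threshold.
    return {
        read_id: (h.high_taxon, h.genus if h.identity >= genus_identity else None)
        for read_id, h in best.items()
    }
```

Reads whose best hit falls below 95% identity keep their high-level taxon but get no genus, which mirrors the two-tier assignment described in the text.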

Assembly of the metagenomic reads (only reads >100 bp) was performed using
stringent criteria: an overlap of at least 80 bp of the read, 99% identity and at most a
single gap in the alignment (using Geneious Pro 5.4). Assembled contigs that were less
than 2 kb in length, and those with fewer than three predicted genes, were discarded. We
retained only those contigs that gave consistent hits to only a single high-level taxon
(e.g. Alphaproteobacteria, Euryarchaeota, Bacteroidetes, Actinobacteria). The strict
assembly requirements, combined with the taxonomic uniformity condition imposed
on the assembled sequences, resulted in a total of 88 contigs that were more than 5 kb in
length, had a consistent phylogenetic profile, and were hence more likely to
originate from a single organism.
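
A minimal sketch of the contig filtering step follows. It is an illustration only: the `Contig` record and function names are hypothetical, every predicted gene is assumed to carry exactly one high-level taxon label, and "consistent hits" is interpreted strictly as all genes on a contig agreeing on the same taxon, which the text does not define precisely.

```python
from typing import List, NamedTuple


class Contig(NamedTuple):
    # Hypothetical record for one assembled contig.
    name: str
    length_bp: int
    gene_taxa: List[str]  # one high-level taxon label per predicted gene


def keep_contig(contig, min_len_bp=2000, min_genes=3):
    """Apply the length, gene-count and taxonomic-uniformity filters."""
    # Discard contigs shorter than 2 kb or with fewer than three genes.
    if contig.length_bp < min_len_bp or len(contig.gene_taxa) < min_genes:
        return False
    # Assumption: "consistent" means every gene hits the same taxon.
    return len(set(contig.gene_taxa)) == 1


def filter_contigs(contigs, **thresholds):
    """Return only the contigs that pass all filters."""
    return [c for c in contigs if keep_contig(c, **thresholds)]
```

Because the thresholds are keyword arguments, a stricter length cut such as the 5 kb mentioned for the final contig set could be expressed as `filter_contigs(contigs, min_len_bp=5000)`.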